Characterization of ILP Distribution for NASA NAS Parallel Benchmarks
Abstract
This paper presents a characterization study that analyzes dynamic instruction traces to measure program parallelism. The study supports the view that the experimental design of supercomputers and parallel computers calls for quantifiable methods to evaluate the requirements of different workloads within an application domain. Such methods can help establish a basis for the scientific design of parallel computers driven by application needs, so that performance can be optimized relative to cost. In addition, application characteristics can be used early in the design process to identify bottlenecks such as insufficient resources, insufficient parallelism in the instruction stream, or an overly restrictive scope of concurrency detection. The selection of features to include in a new computer system depends on the needs of the workloads the system will execute, and the number and type of functional units are among the most important design decisions. This paper therefore presents an instruction-level characterization based on dynamic traces analyzed with a trace-driven simulator. It investigates the parallel system needs of a class of contemporary benchmarks taken from the NASA/NAS Parallel Benchmarks (NPB) suite. The NPB is an implementation-independent problem set representative of computational aeroscience workloads. The NPB workloads have been implemented on nearly every parallel platform, and results have been reported by the vendors. Data are presented on the NPB requirements for these resources. The requirements suggest upper limits on the resources needed for efficient processors. We also examine non-uniformities in the distribution of instruction-level parallelism, including variation between benchmark classes and between instruction classes within a benchmark.
In addition, the average instruction class distribution and the shortest path along which a workload would execute on a parallel machine are shown. The results confirm that the NPB workloads represent a wide range of non-redundant applications with different characteristics.
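As a rough illustration of the kind of measurement the abstract describes, the following Python sketch derives an instruction class mix, a dependence-limited path length, and an average ILP figure from a dynamic trace. It is not the authors' simulator. The trace format (`op_class`, destination register, source registers), the unit instruction latency, the unlimited functional units, and the restriction to true register data dependences are all assumptions made for this example, and the name `characterize` is hypothetical.

```python
from collections import Counter


def characterize(trace):
    """Summarize a dynamic trace of (op_class, dest_reg, src_regs) tuples.

    Assumptions (not from the paper): unit latency, unlimited functional
    units, and only true register (read-after-write) dependences.
    """
    ready = {}             # register -> step at which its latest value is produced
    per_step = Counter()   # step -> number of instructions issued in that step
    per_class = Counter()  # op_class -> dynamic instruction count

    for op_class, dest, srcs in trace:
        # An instruction issues one step after its latest-producing source.
        step = 1 + max((ready.get(reg, 0) for reg in srcs), default=0)
        if dest is not None:
            ready[dest] = step
        per_step[step] += 1
        per_class[op_class] += 1

    total = sum(per_class.values())
    path_length = max(per_step, default=0)  # longest dependence chain, in steps
    return {
        "instructions": total,
        "path_length": path_length,
        "ilp": total / path_length if path_length else 0.0,
        "peak_width": max(per_step.values(), default=0),
        "class_mix": dict(per_class),
    }


# Hypothetical six-instruction trace, invented for illustration only.
example_trace = [
    ("load", "r1", []),
    ("load", "r2", []),
    ("fp_add", "r3", ["r1", "r2"]),
    ("fp_mul", "r4", ["r3", "r1"]),
    ("int_add", "r5", []),
    ("store", None, ["r4"]),
]
result = characterize(example_trace)
print(result)
```

Under these assumptions, `path_length` plays the role of the shortest execution path mentioned in the abstract, `peak_width` hints at an upper limit on the functional units that could be kept busy in any one step, and `class_mix` gives the instruction class distribution. A realistic study would also need to model memory dependences, control dependences, and non-unit latencies.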
Similar resources
The NAS Parallel Benchmarks 2.0
We describe a set of implementations of the NAS Parallel Benchmarks based on Fortran 77 and the MPI message passing standard. These implementations, which are intended to be run with little or no tuning, approximate the performance a typical user can expect for a portable parallel program on a distributed memory computer. They complement rather than replace the original NAS Parallel Benchmarks....
NAS Experience with the Cray X1
A Cray X1 computer system was installed at the NASA Advanced Supercomputing (NAS) facility at NASA Ames Research Center in 2004. An evaluation study of this unique high performance computing (HPC) architecture, from the standpoints of processor and system performance, ease of use, and production computing readiness tailored to the needs of the NAS scientific community, was recently completed. T...
The NAS Parallel Benchmarks 2.1 Results
We present performance results for version 2.1 of the NAS Parallel Benchmarks (NPB) on the following architectures: IBM SP2/66 MHz, SGI Power Challenge Array/90 MHz, Cray Research T3D, and Intel Paragon...
The NAS Parallel Benchmarks
DEFINITION: The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers [?]. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-perf...
A Detailed Performance Characterization of Columbia using Aeronautics Benchmarks and Applications
Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and currently ranked as one of the fastest computers in the world. In this paper, we investigate its suitability as a capability computing platform for aeronautics applications. We present the performance characteristics of Columbia obtained on up to eight computing nodes interconnected via the In...
Publication date: 2003